random element
- North America > Canada > Ontario > Toronto (0.15)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Spain > Andalusia > Cádiz Province > Cadiz (0.04)
- North America > Canada > Ontario > Toronto (0.15)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Spain > Andalusia > Cádiz Province > Cadiz (0.04)
Optimal Oblivious Subspace Embeddings with Near-optimal Sparsity
Chenakkod, Shabarish, Dereziński, Michał, Dong, Xiaoyu
An oblivious subspace embedding is a random $m\times n$ matrix $\Pi$ such that, for any $d$-dimensional subspace, with high probability $\Pi$ preserves the norms of all vectors in that subspace within a $1\pm\epsilon$ factor. In this work, we give an oblivious subspace embedding with the optimal dimension $m=\Theta(d/\epsilon^2)$ that has a near-optimal sparsity of $\tilde O(1/\epsilon)$ non-zero entries per column of $\Pi$. This is the first result to nearly match the conjecture of Nelson and Nguyen [FOCS 2013] in terms of the best sparsity attainable by an optimal oblivious subspace embedding, improving on a prior bound of $\tilde O(1/\epsilon^6)$ non-zeros per column [Chenakkod et al., STOC 2024]. We further extend our approach to the non-oblivious setting, proposing a new family of Leverage Score Sparsified embeddings with Independent Columns, which yield faster runtimes for matrix approximation and regression tasks. In our analysis, we develop a new method which uses a decoupling argument together with the cumulant method for bounding the edge universality error of isotropic random matrices. To achieve near-optimal sparsity, we combine this general-purpose approach with new traces inequalities that leverage the specific structure of our subspace embedding construction.
- North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Asia > Singapore > Central Region > Singapore (0.04)
- Research Report (0.64)
- Workflow (0.45)
Explanation sensitivity to the randomness of large language models: the case of journalistic text classification
Bogaert, Jeremie, de Marneffe, Marie-Catherine, Descampe, Antonin, Escouflaire, Louis, Fairon, Cedrick, Standaert, Francois-Xavier
Large language models (LLMs) perform very well in several natural language processing tasks but raise explainability challenges. In this paper, we examine the effect of random elements in the training of LLMs on the explainability of their predictions. We do so on a task of opinionated journalistic text classification in French. Using a fine-tuned CamemBERT model and an explanation method based on relevance propagation, we find that training with different random seeds produces models with similar accuracy but variable explanations. We therefore claim that characterizing the explanations' statistical distribution is needed for the explainability of LLMs. We then explore a simpler model based on textual features which offers stable explanations but is less accurate. Hence, this simpler model corresponds to a different tradeoff between accuracy and explainability. We show that it can be improved by inserting features derived from CamemBERT's explanations. We finally discuss new research directions suggested by our results, in particular regarding the origin of the sensitivity observed in the training randomness.
- Europe > Belgium > Wallonia > Walloon Brabant > Louvain-la-Neuve (0.04)
- Europe > France (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.68)
Sharp bounds for max-sliced Wasserstein distances
We obtain essentially matching upper and lower bounds for the expected max-sliced 1-Wasserstein distance between a probability measure on a separable Hilbert space and its empirical distribution from $n$ samples. By proving a Banach space version of this result, we also obtain an upper bound, that is sharp up to a log factor, for the expected max-sliced 2-Wasserstein distance between a symmetric probability measure $\mu$ on a Euclidean space and its symmetrized empirical distribution in terms of the norm of the covariance matrix of $\mu$ and the diameter of the support of $\mu$.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Michigan > Ingham County > Lansing (0.04)
- North America > United States > Michigan > Ingham County > East Lansing (0.04)
Consistency of some sequential experimental design strategies for excursion set estimation based on vector-valued Gaussian processes
Stange, Philip, Ginsbourger, David
We tackle the extension to the vector-valued case of consistency results for Stepwise Uncertainty Reduction sequential experimental design strategies established in [Bect et al., A supermartingale approach to Gaussian process based sequential design of experiments, Bernoulli 25, 2019]. This lead us in the first place to clarify, assuming a compact index set, how the connection between continuous Gaussian processes and Gaussian measures on the Banach space of continuous functions carries over to vector-valued settings. From there, a number of concepts and properties from the aforementioned paper can be readily extended. However, vector-valued settings do complicate things for some results, mainly due to the lack of continuity for the pseudo-inverse mapping that affects the conditional mean and covariance function given finitely many pointwise observations. We apply obtained results to the Integrated Bernoulli Variance and the Expected Measure Variance uncertainty functionals employed in [Fossum et al., Learning excursion sets of vector-valued Gaussian random fields for autonomous ocean sampling, The Annals of Applied Statistics 15, 2021] for the estimation for excursion sets of vector-valued functions.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Switzerland > Bern > Bern (0.04)
- (2 more...)
Understanding black-box models with dependent inputs through a generalization of Hoeffding's decomposition
Idrissi, Marouane Il, Bousquet, Nicolas, Gamboa, Fabrice, Iooss, Bertrand, Loubes, Jean-Michel
One of the main challenges for interpreting black-box models is the ability to uniquely decompose square-integrable functions of non-mutually independent random inputs into a sum of functions of every possible subset of variables. However, dealing with dependencies among inputs can be complicated. We propose a novel framework to study this problem, linking three domains of mathematics: probability theory, functional analysis, and combinatorics. We show that, under two reasonable assumptions on the inputs (non-perfect functional dependence and non-degenerate stochastic dependence), it is always possible to decompose uniquely such a function. This ``canonical decomposition'' is relatively intuitive and unveils the linear nature of non-linear functions of non-linearly dependent inputs. In this framework, we effectively generalize the well-known Hoeffding decomposition, which can be seen as a particular case. Oblique projections of the black-box model allow for novel interpretability indices for evaluation and variance decomposition. Aside from their intuitive nature, the properties of these novel indices are studied and discussed. This result offers a path towards a more precise uncertainty quantification, which can benefit sensitivity analyses and interpretability studies, whenever the inputs are dependent. This decomposition is illustrated analytically, and the challenges to adopting these results in practice are discussed.
- North America > United States > Massachusetts > Suffolk County > Boston (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- Europe > France > Occitanie > Haute-Garonne > Toulouse (0.04)
- (9 more...)
Quantitative CLTs in Deep Neural Networks
Favaro, Stefano, Hanin, Boris, Marinucci, Domenico, Nourdin, Ivan, Peccati, Giovanni
We study the distribution of a fully connected neural network with random Gaussian weights and biases in which the hidden layer widths are proportional to a large constant $n$. Under mild assumptions on the non-linearity, we obtain quantitative bounds on normal approximations valid at large but finite $n$ and any fixed network depth. Our theorems show both for the finite-dimensional distributions and the entire process, that the distance between a random fully connected network (and its derivatives) to the corresponding infinite width Gaussian process scales like $n^{-\gamma}$ for $\gamma>0$, with the exponent depending on the metric used to measure discrepancy. Our bounds are strictly stronger in terms of their dependence on network width than any previously available in the literature; in the one-dimensional case, we also prove that they are optimal, i.e., we establish matching lower bounds.
- North America > United States > New York > New York County > New York City (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Latent Multimodal Functional Graphical Model Estimation
Tsai, Katherine, Zhao, Boxin, Koyejo, Sanmi, Kolar, Mladen
Joint multimodal functional data acquisition, where functional data from multiple modes are measured simultaneously from the same subject, has emerged as an exciting modern approach enabled by recent engineering breakthroughs in the neurological and biological sciences. One prominent motivation to acquire such data is to enable new discoveries of the underlying connectivity by combining multimodal signals. Despite the scientific interest, there remains a gap in principled statistical methods for estimating the graph underlying multimodal functional data. To this end, we propose a new integrative framework that models the data generation process and identifies operators mapping from the observation space to the latent space. We then develop an estimator that simultaneously estimates the transformation operators and the latent graph. This estimator is based on the partial correlation operator, which we rigorously extend from the multivariate to the functional setting. Our procedure is provably efficient, with the estimator converging to a stationary point with quantifiable statistical error. Furthermore, we show recovery of the latent graph under mild conditions. Our work is applied to analyze simultaneously acquired multimodal brain imaging data where the graph indicates functional connectivity of the brain. We present simulation and empirical results that support the benefits of joint estimation.
- North America > United States > New York > New York County > New York City (0.14)
- Asia > Middle East > Jordan (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (2 more...)
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- Health & Medicine > Health Care Technology (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.92)
- (2 more...)
Random Inverse Problems Over Graphs: Decentralized Online Learning
We establish a framework of distributed random inverse problems over network graphs with online measurements, and propose a decentralized online learning algorithm. This unifies the distributed parameter estimation in Hilbert spaces and the least mean square problem in reproducing kernel Hilbert spaces (RKHS-LMS). We transform the convergence of the algorithm into the asymptotic stability of a class of inhomogeneous random difference equations in Hilbert spaces with L2-bounded martingale difference terms and develop the L2 -asymptotic stability theory in Hilbert spaces. It is shown that if the network graph is connected and the sequence of forward operators satisfies the infinite-dimensional spatio-temporal persistence of excitation condition, then the estimates of all nodes are mean square and almost surely strongly consistent. Moreover, we propose a decentralized online learning algorithm in RKHS based on non-stationary and non-independent online data streams, and prove that the algorithm is mean square and almost surely strongly consistent if the operators induced by the random input data satisfy the infinite-dimensional spatio-temporal persistence of excitation condition.
- Asia > China > Shanghai > Shanghai (0.04)
- North America > United States > New York (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- (2 more...)